Linguistics isn't always the answer: Word comparison in computational linguistics

Author

  • Lars Borin
Abstract

String similarity metrics are important tools in computational linguistics, extensively used e.g. for comparing words in a variety of problem domains. This paper examines the sometimes made assumption that the performance of such word comparison methods would benefit from the use of linguistic, viz. phonological and morphological, knowledge. One linguistically naive method and one incorporating a moderate amount of linguistic sophistication were compared on a bilingual and a monolingual word comparison task for a range of languages. The results show the performance, measured as recall and precision, of the linguistically naive method to be superior in all cases.
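As an illustration of the kind of evaluation the abstract describes, the following is a minimal sketch of a linguistically naive word comparison scored by precision and recall. The abstract does not name the metrics the paper used, so the choice of length-normalised edit distance, the threshold, and all function names here are assumptions made for illustration only.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed naive metric: 1 minus edit distance over the longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def match_pairs(pairs, threshold: float) -> set:
    """Keep the word pairs whose similarity reaches the threshold."""
    return {(a, b) for a, b in pairs if similarity(a, b) >= threshold}


def precision_recall(predicted: set, gold: set) -> tuple:
    """Precision and recall of predicted pairs against a gold standard."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical English-German candidates; not data from the paper.
candidates = [("night", "nacht"), ("water", "wasser"), ("dog", "hund")]
gold = {("night", "nacht"), ("water", "wasser")}
predicted = match_pairs(candidates, threshold=0.5)
print(precision_recall(predicted, gold))
```

A linguistically informed variant would replace `similarity` with a function that, for example, assigns lower substitution costs to phonologically related segments. The evaluation step stays the same, which is what makes the two methods directly comparable.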


Similar resources

Do We Need Discipline-Specific Academic Word Lists? Linguistics Academic Word List (LAWL)

This corpus-based study aimed at exploring the most frequently used academic words in linguistics and comparing the resulting word list with the distribution of high-frequency words in Coxhead’s Academic Word List (AWL) and West’s General Service List (GSL) to examine their coverage within the linguistics corpus. To this end, a corpus of 700 linguistics research articles (LRAC), consisting of approximately ...


Producing a Persian Text Tokenizer Corpus Focusing on Its Computational Linguistics Considerations

The main task of tokenization is to divide the sentences of a text into their constituent units and remove punctuation marks (dots, commas, etc.). Each unit is a continuous lexical or grammatical written string that forms an independent semantic unit. Tokenization occurs at the word level, and the extracted units can be used as input to other components such as a stemmer. The requirement to create...


Q&A: Already A Success?

When Prof. Wolfgang Wahlster (the organizer of this COLING-86 panel on "Natural Language Interfaces: Ready for Commercial Success?") sent out invitations to panelists, he stated that his goals were "to evaluate three natural language interfaces which were introduced to the commercial market in 1985 and to relate them to current research in computational linguistics." For comparison, he has aske...


Foundations of computational linguistics - human-computer communication in natural language (2. ed.)

Bring home now the book enPDFd foundations of computational linguistics human computer communication in natural language to be your sources when going to read. It can be your new collection to not only display in your racks but also be the one that can help you fining the best sources. As in common, book is the window to get in the world and you can open the world easily. These wise words are r...


Word-Forming Process in Azeri Turkish Language

The subject intended to study the general methods of natural word-forming in Azeri Turkish language. This study aimed to reach this purpose by analyzing the construction of compound Azeri Turkish words. Same’ei (2016) did a comprehensive study on word-forming process in Farsi, which was the inspiration source of this study for Azeri Turkish language word-forming. Numerous scholars had done vari...



Publication date: 1998